Abstract. This paper proposes a mechanism for learning pattern correspondences between
two languages from a corpus of translated sentence pairs. The proposed mechanism
uses analogical reasoning between two translations. Given a pair of translations, the
similar parts of the sentences in the source language must correspond to the similar parts of
the sentences in the target language. Similarly, the different parts should correspond to the
respective parts in the translated sentences. The correspondences between the similarities,
and also between the differences, are learned in the form of translation rules. The system was
tested on a small training dataset and produced promising results for further investigation.



1 Introduction 



Traditional approaches to machine translation (MT) suffer from tractability, scalability and per- 
formance problems due to the necessary extensive knowledge of both the source and the target 
languages. Corpus-based machine translation is one of the alternative directions that have been 
proposed to overcome the difficulties of traditional systems. Two fundamental approaches in 
corpus-based MT have been followed. These are statistical and example-based machine transla- 
tion (EBMT), also called memory-based machine translation (MBMT). Both approaches assume 
the existence of a bilingual text (an already translated corpus) to derive a translation for an in- 
put. While statistical MT techniques use statistical metrics to choose the most probable words 
in the target language, EBMT techniques employ pattern matching techniques to translate 
subparts of the given input |l| . 

Exemplar-based representation has been widely used in Machine Learning (ML). According 
to Medin and Schaffer |Q , who originally proposed exemplar-based learning as a model of human 
learning, examples are stored in memory without any change in the representation. Here, an 
exemplar is a characteristic example stored in the memory. The basic idea in exemplar-based 
learning is to use past experiences or cases to understand, plan, or learn from novel situations 

§§0- 

EBMT has been proposed by Nagao || as Translation by Analogy which is in parallel with 
memory based reasoning |Q , case-based reasoning |llj and derivational analogy Q . Example- 
based translation relies on the use of past translation examples to derive a translation for a 
given input ||, |9, 12 , p| E^. The input sentence to be translated is compared with the example 
translations analogically to retrieve the closest examples to the input. Then, the fragments 
of the retrieved examples are translated and recombined in the target language. Prior to the 
translation of an input sentence, the correspondences between the source and target languages 
should be available to the system; however this issue has not been given enough consideration 
by the current EBMT systems. Kitano js| has adopted the manual encoding of the translation 
rules; however, this is a difficult and error-prone task for a large corpus. Wu (if]] uses a
method to extract phrasal translation examples in sentence-aligned parallel corpora using a 
probabilistic translation lexicon for the language pair. Wu's inversion transduction grammar 
(ITG) formalism is used to model two languages simultaneously. In this paper, we formulate 
this acquisition problem as a machine learning task in order to automate the process. 

In this paper, we propose a technique which stores exemplars in the form of templates that are 
generalized exemplars. A template is an example translation pair where some components (e.g., 
word stems and morphemes) are generalized by replacing them with variables in both sentences, 
and establishing bindings between the variables. We will refer to this technique as GEBMT, for
Generalized Example Based Machine Translation. We assume no grammatical knowledge about the
languages except the morphological structure of some words in the languages.

The algorithm we propose here, for learning such templates, is based on a heuristic to 
learn the correspondences between the patterns in the source and target languages, from two 
translation pairs. The heuristic can be summarized as follows: Given two translation pairs, if the 
sentences in the source language exhibit some similarities, then the corresponding sentences in 
the target language must have similar parts, and they must be translations of the similar parts 
of the sentences in the source language. Further, the remaining different parts of the source 
sentences should also match the corresponding differences of the target sentences. However, if 
the sentences do not exhibit any similarity, then no correspondences are inferred. Consider the
following two translation pairs, given in English and Turkish, to illustrate the heuristic:

I give+PAST the book to Mary
<-> Mary+DAT kitap+ACC ver+PAST+1SG

I give+PAST the pencil to Mary
<-> Mary+DAT kurşun kalem+ACC ver+PAST+1SG



The similarities between the translation examples are the parts shared by both pairs; the
remaining parts are the differences between the sentences. We represent the similarities in the
source language as "I give+PAST the X^S to Mary", and the corresponding similarities in the
target language as "Mary+DAT X^T +ACC ver+PAST+1SG". According to our heuristic, these
similarities should correspond to each other. Here, X^S denotes a component that can be replaced
by any appropriate structure in the source language, and X^T refers to its translation in the
target language. This notation represents an abstraction of the differences {book vs. pencil}
and {kitap vs. kurşun kalem} in the source and target languages, respectively. Using the
heuristic further, we infer that book should correspond to kitap and pencil should correspond
to kurşun kalem, hence learning further correspondences between the examples.

Our learning algorithm based on this heuristic is called TRL (Translation Rule Learner).
Given a corpus of translation cases, TRL infers the correspondences between the source and 
target languages in the form of translation rules. These rules can be used for translation in 
both directions. Therefore, in the rest of the paper we will refer to these languages as $L_1$ and $L_2$.
Although the examples and experiments herein are on English and Turkish, we believe that the 
model is equally applicable to other language pairs. 

The rest of the paper is organized as follows. Section 2 describes the underlying mechanisms
of TRL, along with sample rule derivations. Section 3 gives more learning examples. Section 4
illustrates the translation process using translation rules. Section 5 concludes the paper.



2 Learning 



Our learning algorithm TRL infers translation rules using similarities and differences between a
pair of translation examples $(E_i, E_j)$ from a bilingual corpus. A translation example $E$ is
also a pair $(E^{L_1} \leftrightarrow E^{L_2})$, where $E^{L_1}$ and $E^{L_2}$ are equivalent
sentences in languages $L_1$ and $L_2$. Using a matching algorithm, we find a match sequence
$M^{L_1}$ representing the similarities and differences in $E_i^{L_1}$ and $E_j^{L_1}$, and a
match sequence $M^{L_2}$ for $E_i^{L_2}$ and $E_j^{L_2}$. From these two match sequences,
we learn translation rules.

In our examples, we will use translation examples between English and Turkish. A translation
example consists of an English sentence and a Turkish sentence. We will use the lexical
level representation (see footnote 2) for each sentence in our examples. For example, the
English sentence "I broke a pencil" will be represented by

I break+PAST a pencil

and its equivalent Turkish sentence "Bir kurşun kalem kırdım" will be represented by

Bir kurşun kalem kır+PAST+1SG

For a pair of translation examples $((E_i^{L_1} \leftrightarrow E_i^{L_2}), (E_j^{L_1} \leftrightarrow E_j^{L_2}))$,
the matching algorithm produces match sequences $M^{L_1}$ and $M^{L_2}$ to represent the
similarities and differences in the examples in languages $L_1$ and $L_2$, respectively. A match
sequence $M$ for two different sentences will be in the following form.

$S_1 \; D_1 \; S_2 \; \cdots \; D_n \; S_{n+1}$ where $n \geq 1$

In that sequence, each $S_i$ represents a similarity between the sentences. In other words, it
is a substring which is common to both of those sentences. Each $D_i$ represents a difference,
which is a pair of non-empty substrings of the sentences, one from the first sentence and the
other from the second sentence. For each difference $D_i = D_{i,1} : D_{i,2}$, the substrings
$D_{i,1}$ and $D_{i,2}$ do not contain any common item. Also, no lexical item in a similarity
$S_i$ appears in any previously formed difference $D_k$ for $k < i$. Either of $S_1$ and
$S_{n+1}$ can be empty; however, $S_i$ for $1 < i < n+1$ must be non-empty. These
restrictions guarantee that there exists either a unique match or no match between two different
examples.
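The construction of a match sequence can be sketched in Python. This is only an illustration under stated assumptions: the paper does not specify its matching algorithm, so the standard library's `difflib.SequenceMatcher` stands in for it here, and the names `tokenize` and `match_sequence` are hypothetical. Morphemes are split off at `+` so that they can be matched separately. The sketch rejects a pair when a difference would be empty on one side or when its two sides share an item; it does not check the restriction concerning items of previously formed differences.

```python
from difflib import SequenceMatcher


def tokenize(sentence):
    """Split a lexical-level sentence into stems and '+'-prefixed morphemes."""
    tokens = []
    for word in sentence.split():
        stem, *suffixes = word.split("+")
        if stem:
            tokens.append(stem)
        tokens.extend("+" + suffix for suffix in suffixes)
    return tokens


def match_sequence(first, second):
    """Return a list of ("S", tokens) and ("D", left, right) items, or None.

    SequenceMatcher is an assumed stand-in for the paper's matching algorithm.
    """
    a, b = tokenize(first), tokenize(second)
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    sequence = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            sequence.append(("S", tuple(a[i1:i2])))
        elif tag == "replace":
            left, right = tuple(a[i1:i2]), tuple(b[j1:j2])
            if set(left) & set(right):
                return None  # the two sides of a difference share an item
            sequence.append(("D", left, right))
        else:
            return None  # insert/delete: one side of the difference is empty
    if all(item[0] == "S" for item in sequence):
        return None  # identical sentences have no difference
    return sequence


print(match_sequence("it is a book", "it is a pencil"))
# [('S', ('it', 'is', 'a')), ('D', ('book',), ('pencil',))]
```

Representing similarities and differences as tagged tuples keeps the sequence easy to inspect and lets later steps replace a difference by a variable in place.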

For example, in the following translation examples

it is a book <-> o bir kitap+COP
it is a pencil <-> o bir kurşun kalem+COP

the similarities are the items shared by the two sentences of each language, and the
differences are the remaining items. The match sequence for the English sentences will be

it is a book : pencil

Note that we have one similarity and one difference between the English sentences. The match
sequence for the Turkish sentences will be

o bir kitap : kurşun kalem +COP

where we have two similarities and one difference.

Footnote 2: In our examples, PAST, AOR, PRG, FUT denote the past, aorist, progressive and
future tenses; COND, NEC denote conditional and necessitative; ACC, DAT, LOC, ABL denote the
accusative, dative, locative and ablative case markers for nouns; 1SG, 2SG, 3SG denote first,
second and third person singular verbal agreement; COP denotes the copula in verbs.

In the example above, the difference in English sentences must correspond to the difference 
in Turkish sentences, and similarities in them must correspond to each other in that context. 
TRL can learn the following translation rules from differences and similarities in that example. 

book <-> kitap
pencil <-> kurşun kalem
it is a X^E <-> o bir X^T +COP    where X^E is a translation of X^T

The first two rules are learned from the differences in the English and Turkish sentences,
namely book : pencil and kitap : kurşun kalem. The last rule is learned from the similarities in
the example. In addition to these three learned rules, we also put the two translation rules
directly given in the example into our learned rule database. They are more specific forms of
the third learned rule. We order the rules in the database from the most specific to the least
specific. As a result of this ordering, the first applicable rule found during translation is
the most specific one.

When the number of differences in the two match sequences $M^{L_1}$ and $M^{L_2}$ of a pair of
translation examples is greater than 1, say $n$, the learning algorithm learns new rules only if
$n - 1$ of the differences can be resolved using rules already learned from previous examples.
Otherwise, the current version of the algorithm cannot learn new rules. From the following example,

I give+PAST the book <-> Kitap+ACC ver+PAST+1SG
You give+PAST the pencil <-> Kurşun kalem+ACC ver+PAST+2SG

we will get the following match sequences.

M^E = I : You give+PAST the book : pencil
M^T = Kitap : Kurşun kalem +ACC ver+PAST +1SG : +2SG

Both $M^E$ and $M^T$ have two differences. If we had not learned anything before this example,
there would be no way to know whether the difference I : You in the English sentences
corresponds to the difference Kitap : Kurşun kalem or to +1SG : +2SG in the Turkish sentences.
Now, let us assume that we have already learned the following translation rules from some
previous examples.

book <-> Kitap
pencil <-> Kurşun kalem

Since we now know that the difference book : pencil corresponds to the difference Kitap : Kurşun
kalem, the difference I : You must correspond to the difference +1SG : +2SG. Thus, we can learn
the following new translation rules from this example.

I <-> +1SG
You <-> +2SG
X^E give+PAST the Y^E <-> Y^T +ACC ver+PAST X^T
    where X^E is a translation of X^T and Y^E is a translation of Y^T.

For a given pair of translation examples $((E_i^{L_1} \leftrightarrow E_i^{L_2}), (E_j^{L_1} \leftrightarrow E_j^{L_2}))$,
the algorithm of the translation rule learner (TRL) is given in Figure 1. In that algorithm, we
first find the match sequences $M^{L_1}$ and $M^{L_2}$ for the sentences in languages $L_1$ and
$L_2$, respectively. Then, we try to reduce the number of differences in these match sequences
to one. At the same time, we construct Condition, which is a conjunction of translation goals
for a translation rule that will be learned later in the algorithm. After this reduction, each
of our match sequences will have exactly one difference, so these unlearned differences must
correspond to each other. From this fact, we learn the three translation rules given at the end
of the algorithm. In the implementation, each learned translation rule is represented in the
form of a Prolog fact or rule.



Let ((E_i^L1 <-> E_i^L2), (E_j^L1 <-> E_j^L2)) be a pair of translation examples.
M^L1 <- match(E_i^L1, E_j^L1);
M^L2 <- match(E_i^L2, E_j^L2);
if #ofSimilarity(M^L1) = 0 or #ofSimilarity(M^L2) = 0 then exit;
if #ofDifference(M^L1) = 0 or
   #ofDifference(M^L1) ≠ #ofDifference(M^L2) then exit;
Condition <- "";
i <- 1;
while #ofDifference(M^L1) > 1 do
begin
    if there exists a D^L1 in M^L1 and a D^L2 in M^L2 such that
       the correspondence of D^L1 to D^L2 has already been learned then
    begin
        Replace D^L1 in M^L1 with a new variable X_i^L1;
        Replace D^L2 in M^L2 with a new variable X_i^L2;
        Add "X_i^L1 <-> X_i^L2 and" to the end of Condition;
        i <- i + 1;
    end
    else exit;
end
Let D^L1 in M^L1 and D^L2 in M^L2 be the unlearned differences such that
    D^L1 is D_1^L1 : D_2^L1 and D^L2 is D_1^L2 : D_2^L2;
Replace D^L1 in M^L1 with a new variable X_i^L1;
Replace D^L2 in M^L2 with a new variable X_i^L2;
Add "X_i^L1 <-> X_i^L2" to the end of Condition;
Learn the following translation rules:
    D_1^L1 <-> D_1^L2
    D_2^L1 <-> D_2^L2
    M^L1 <-> M^L2 if Condition

Figure 1. Translation Rule Learner Algorithm For Two Translated Sentence Pairs 
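The steps of Figure 1 can be sketched in Python as follows. This is an illustrative sketch rather than the authors' implementation, which is in Prolog. It assumes that the two match sequences are already computed and given as lists of `("S", tokens)` and `("D", left, right)` items with empty similarities omitted, and that previously learned correspondences are available as a set of token-tuple pairs. The names `learn_rules`, `render` and `known`, and the variable names `X1`, `X2`, ..., are hypothetical; a variable `Xk` in the condition stands for the goal that `Xk` on the $L_1$ side translates to `Xk` on the $L_2$ side.

```python
def render(sequence):
    """Turn a match sequence, with variables substituted, into a template string."""
    return " ".join(
        item[1] if item[0] == "V" else " ".join(item[1]) for item in sequence
    )


def learn_rules(m1, m2, known):
    """Return a list of (l1_side, l2_side, condition) rules, or [] on exit."""
    m1, m2 = list(m1), list(m2)
    diffs1 = [k for k, item in enumerate(m1) if item[0] == "D"]
    diffs2 = [k for k, item in enumerate(m2) if item[0] == "D"]
    if len(diffs1) == len(m1) or len(diffs2) == len(m2):
        return []  # no similarity in one of the match sequences
    if not diffs1 or len(diffs1) != len(diffs2):
        return []  # no difference, or unequal numbers of differences
    condition = []

    def replace_with_variable(p, q):
        name = f"X{len(condition) + 1}"
        m1[p] = m2[q] = ("V", name)
        diffs1.remove(p)
        diffs2.remove(q)
        condition.append(name)

    # Resolve differences with already learned rules until one is left.
    while len(diffs1) > 1:
        resolved = next(
            ((p, q) for p in diffs1 for q in diffs2
             if (m1[p][1], m2[q][1]) in known and (m1[p][2], m2[q][2]) in known),
            None,
        )
        if resolved is None:
            return []  # cannot reduce further, nothing is learned
        replace_with_variable(*resolved)

    # The remaining unlearned differences must correspond to each other.
    p, q = diffs1[0], diffs2[0]
    (_, a1, b1), (_, a2, b2) = m1[p], m2[q]
    replace_with_variable(p, q)
    return [
        (" ".join(a1), " ".join(a2), []),
        (" ".join(b1), " ".join(b2), []),
        (render(m1), render(m2), condition),
    ]


m_e = [("D", ("I",), ("You",)), ("S", ("give", "+PAST", "the")),
       ("D", ("book",), ("pencil",))]
m_t = [("D", ("Kitap",), ("Kurşun", "kalem")), ("S", ("+ACC", "ver", "+PAST")),
       ("D", ("+1SG",), ("+2SG",))]
known = {(("book",), ("Kitap",)), (("pencil",), ("Kurşun", "kalem"))}
for rule in learn_rules(m_e, m_t, known):
    print(rule)
```

With the two previously learned rules for book and pencil, the sketch returns the rules for I and You and the generalized template, mirroring the give+PAST example of this section.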



3 Examples 

In order to evaluate the TRL algorithm we have developed a sample bilingual parallel text. In 
this section, we will illustrate the behavior of TRL on that sample text. 

Example 1: Given the example translations "I saw you at the garden" <-> "Seni bahçede
gördüm" and "I saw you at the party" <-> "Seni partide gördüm", their lexical level
representations are

i see+PAST you at the garden <-> sen+ACC bahçe+LOC gör+PAST+1SG
i see+PAST you at the party <-> sen+ACC parti+LOC gör+PAST+1SG

From these examples, the following translation rules are learned:

i see+PAST you at the X_1^E <-> sen+ACC X_1^T +LOC gör+PAST+1SG
    if X_1^E <-> X_1^T
garden <-> bahçe
party <-> parti

Example 2: Given the example translations "It is raining" <-> "Yağmur yağıyor", "He comes"
<-> "Gelir", "If it is raining then you should take an umbrella" <-> "Eğer yağmur yağıyorsa bir
şemsiye almalısın" and "If he comes then we will go to the theater" <-> "Eğer gelirse tiyatroya
gideceğiz", their lexical level representations are

it is rain+PRG <-> yağmur yağı+PRG
he come+AOR <-> gel+AOR

if it is rain+PRG then you should take an umbrella
<-> eğer yağmur yağı+PRG+COND bir şemsiye al+NEC+2SG

if he come+AOR then we will go to the theater
<-> eğer gel+AOR+COND tiyatro+DAT git+FUT+1PL

From the last two examples, using the first two examples, the following translation rules are
learned:

if X_1^E then X_2^E <-> eğer X_1^T +COND X_2^T
    if X_1^E <-> X_1^T and X_2^E <-> X_2^T
you should take an umbrella <-> bir şemsiye al+NEC+2SG
we will go to the theater <-> tiyatro+DAT git+FUT+1PL

Example 3: Given the example translations "I went" <-> "gittim", "you went" <-> "gittin" and
"I came" <-> "geldim", their lexical level representations are

i go+PAST <-> git+PAST+1SG
you go+PAST <-> git+PAST+2SG
i come+PAST <-> gel+PAST+1SG

From the first and second examples, where the differences are i : you and +1SG : +2SG, the
following translation rules are learned:

X_1^E go+PAST <-> git+PAST X_1^T
    if X_1^E <-> X_1^T
i <-> +1SG
you <-> +2SG

And from the first and third examples, where the differences are go : come and git : gel, the
following translation rules are learned:

i X_1^E +PAST <-> X_1^T +PAST+1SG
    if X_1^E <-> X_1^T
go <-> git
come <-> gel



4 Translation 

The translation rules learned by the TRL algorithm can be used in the translation directly. The 
outline of the translation process is given below: 

1. First, the lexical level representation of the input sentence is derived. 

2. The most specific matching translation rule is found for the input sentence. If the template
for the language of the input sentence in a translation rule matches the input sentence, we
call it a matching rule. During this matching, certain variables in the template can bind to
substrings of the input sentence. Then, translations for these bound variables are sought.
If these processes are successful, we obtain the lexical level representation of the output
sentence. The most specific matching rule is the one that contains the maximum number of
matching terminals and the minimum number of variables.

3. The surface level representation of the output sentence is generated from the lexical level
representation obtained in the previous step.

Note that, if the input sentence in the source language is ambiguous, then templates cor- 
responding to each sense will be retrieved, and the sentences for each sense will be generated. 
The translation rules learned by TRL can be used for translation in both directions. 
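Step 2 of the outline above can be sketched in Python. This is a simplified illustration, not the authors' Prolog implementation. It assumes that rules are pairs of space-separated templates, that tokens of the form `X1`, `X2`, ... are variables (a hypothetical naming convention), and that the same variable name marks corresponding positions on both sides. Template matching is done with regular expressions, which is a choice made for this sketch. Only the first translation found is returned, so the handling of ambiguous input described above is not covered. The rules in the usage part are those of Example 3.

```python
import re

VARIABLE = re.compile(r"[XY]\d*")  # assumed naming convention for variables


def is_variable(token):
    return VARIABLE.fullmatch(token) is not None


def specificity(rule):
    """Sort key: more terminals first, then fewer variables."""
    source = rule[0].split()
    n_vars = sum(is_variable(token) for token in source)
    return (-(len(source) - n_vars), n_vars)


def translate(sentence, rules):
    """Translate a space-separated lexical-level string, or return None."""
    for source, target in sorted(rules, key=specificity):
        # Terminals must match literally; each variable binds a non-empty substring.
        pattern = " ".join(
            f"(?P<{token}>.+)" if is_variable(token) else re.escape(token)
            for token in source.split()
        )
        match = re.fullmatch(pattern, sentence)
        if match is None:
            continue
        # Seek translations for the bound substrings with the same rule set.
        bindings = {name: translate(value, rules)
                    for name, value in match.groupdict().items()}
        if None in bindings.values():
            continue
        return " ".join(bindings.get(token, token) for token in target.split())
    return None


rules = [
    ("X1 go +PAST", "git +PAST X1"),
    ("i", "+1SG"),
    ("you", "+2SG"),
    ("i X1 +PAST", "X1 +PAST +1SG"),
    ("go", "git"),
    ("come", "gel"),
]
print(translate("you go +PAST", rules))   # git +PAST +2SG
print(translate("i come +PAST", rules))   # gel +PAST +1SG

# The same rules can be used in the other direction by swapping the sides.
reverse = [(target, source) for source, target in rules]
print(translate("gel +PAST +1SG", reverse))  # i come +PAST
```

The recursion terminates because every learned template contains at least one terminal, so a bound substring is always shorter than the sentence it was taken from.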

5 Conclusion 

In this paper, we have presented a model for learning translation rules between two languages. 
We integrated this model with an example-based translation model into Generalized Exemplar- 
Based Machine Translation. We have implemented this model as the TRL (Translation Rule 
Learner) algorithm. The TRL algorithm is illustrated in learning translation rules between 
Turkish and English. 

The major contribution of this paper is that the proposed TRL algorithm eliminates the 
need for manually encoding the translations, which is a difficult task for a large corpus. The 
TRL algorithm can work directly on surface level representation of sentences. However, in order 
to generate useful translation patterns, it is helpful to use the lexical level representations. It 
is usually trivial, at least for English and Turkish, to obtain the lexical level representations of 
words. 

Our main motivation was that the underlying inference mechanism is compatible with one 
of the ways humans learn languages, i.e. learning from examples. We believe that in everyday 
usage, humans learn general sentence patterns, using the similarities and differences between 
many different example sentences that they are exposed to. This observation led us to the idea
that a computer can be trained similarly, using analogy within a corpus of example translations. 

On the small sample corpus we used, the accuracy of the translations produced by this approach
is quite high, with grammaticality ensured by the learned rules. Since a translation is carried
out using the learned rules, the accuracy of the output translation depends critically on the
accuracy of those rules.



We do not require an extra operation to maintain the grammaticality and the style of the 
output, as in Kitano's EBMT model f|. The information necessary to maintain these issues is 
directly provided by the translation rules. 

The model that we have proposed in this paper may be integrated with an intelligent tutoring 
system (ITS) for second language learning. The rule representation in our model provides a 
level of information that may help in error diagnosis and student modeling tasks of an ITS. The 
model may also be used in tuning the teaching strategy according to the needs of the student by 
analyzing the student answers analogically with the closest cases in the corpus. Specific corpora 
may be designed to concentrate on certain topics that will help in student's acquisition of the 
target language. The work presented in this paper provides an opportunity to evaluate this
possibility as future work.